Source-Filter-Based Generative Adversarial Neural Vocoder for High Fidelity Speech Synthesis
This paper proposes a source-filter-based generative adversarial neural
vocoder named SF-GAN, which achieves high-fidelity waveform generation from
input acoustic features by introducing F0-based source excitation signals to a
neural filter framework. The SF-GAN vocoder is composed of a source module and
a resolution-wise conditional filter module and is trained based on generative
adversarial strategies. The source module produces an excitation signal from
the F0 information, then the resolution-wise convolutional filter module
combines the excitation signal with processed acoustic features at various
temporal resolutions and finally reconstructs the raw waveform. The
experimental results show that our proposed SF-GAN vocoder outperforms the
state-of-the-art HiFi-GAN and Fre-GAN in both analysis-synthesis (AS) and
text-to-speech (TTS) tasks, and the synthesized speech quality of SF-GAN is
comparable to the ground-truth audio.
Comment: Accepted by NCMMSC 202
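As a rough illustration of the source module described above, the sketch below builds a sine-based excitation from frame-level F0 by holding each F0 value for one hop and accumulating phase sample by sample. The function name `sine_excitation`, its parameters, and the choice to emit silence for unvoiced frames (F0 = 0) are assumptions made for illustration; the abstract does not specify how SF-GAN constructs its excitation signal.

```python
import math


def sine_excitation(f0_frames, hop_length, sample_rate, amplitude=0.1):
    """Turn frame-level F0 (Hz) into a sample-level sine excitation.

    Hypothetical helper: each frame's F0 is held for `hop_length` samples,
    and the phase is accumulated so the sine stays continuous across frames.
    Frames with F0 <= 0 are treated as unvoiced and produce zeros.
    """
    excitation = []
    phase = 0.0
    for f0 in f0_frames:
        for _ in range(hop_length):
            if f0 > 0:
                # Advance the phase by one sample at the current frequency.
                phase += 2.0 * math.pi * f0 / sample_rate
                excitation.append(amplitude * math.sin(phase))
            else:
                excitation.append(0.0)
    return excitation


# Two voiced frames at 100 Hz followed by one unvoiced frame.
signal = sine_excitation([100.0, 100.0, 0.0], hop_length=160, sample_rate=16000)
```

A neural filter would then consume this excitation together with the acoustic features; that part is not sketched here.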
Long-frame-shift Neural Speech Phase Prediction with Spectral Continuity Enhancement and Interpolation Error Compensation
Speech phase prediction, which is a significant research focus in the field
of signal processing, aims to recover speech phase spectra from
amplitude-related features. However, existing speech phase prediction methods
are constrained to recovering phase spectra with short frame shifts, which are
considerably smaller than the theoretical upper bound required for exact
waveform reconstruction of short-time Fourier transform (STFT). To tackle this
issue, we present a novel long-frame-shift neural speech phase prediction
(LFS-NSPP) method which enables precise prediction of long-frame-shift phase
spectra from long-frame-shift log amplitude spectra. The proposed method
consists of three stages: interpolation, prediction and decimation. The
short-frame-shift log amplitude spectra are first constructed from
long-frame-shift ones through frequency-by-frequency interpolation to enhance
the spectral continuity, and then employed to predict short-frame-shift phase
spectra using an NSPP model, thereby compensating for interpolation errors.
Ultimately, the long-frame-shift phase spectra are obtained from
short-frame-shift ones through frame-by-frame decimation. Experimental results
show that the proposed LFS-NSPP method predicts long-frame-shift phase spectra
with higher quality than the original NSPP model and other
signal-processing-based phase estimation algorithms.
Comment: Published at IEEE Signal Processing Letter
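The three stages named in the abstract (interpolation, prediction, decimation) can be sketched as a small pipeline. This is only a structural sketch under stated assumptions: linear interpolation stands in for whatever interpolation the authors use, `predict_phase` is a hypothetical placeholder for a trained NSPP model, and all function names are invented for illustration.

```python
def interpolate_frames(log_amp, factor):
    """Linearly interpolate along time, independently for each frequency bin.

    `log_amp` is a non-empty list of frames, each a list of per-bin values.
    Inserts `factor - 1` frames between every pair of neighbouring frames.
    """
    out = []
    for t in range(len(log_amp) - 1):
        for k in range(factor):
            w = k / factor
            out.append([(1 - w) * a + w * b
                        for a, b in zip(log_amp[t], log_amp[t + 1])])
    out.append(list(log_amp[-1]))
    return out


def decimate_frames(frames, factor):
    """Keep every `factor`-th frame (frame-by-frame decimation)."""
    return frames[::factor]


def lfs_pipeline(log_amp, factor, predict_phase):
    """Interpolate, predict short-frame-shift phase, then decimate."""
    short_amp = interpolate_frames(log_amp, factor)
    short_phase = predict_phase(short_amp)  # placeholder for an NSPP model
    return decimate_frames(short_phase, factor)


# Usage with a dummy predictor that returns its input unchanged.
long_amp = [[0.0, 2.0], [1.0, 4.0], [2.0, 6.0]]
long_phase = lfs_pipeline(long_amp, factor=2, predict_phase=lambda x: x)
```

With the dummy predictor the output has the same number of frames as the input, which shows that decimation undoes the frame-rate change introduced by interpolation.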
Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement
Phase information has a significant impact on speech perceptual quality and
intelligibility. However, existing speech enhancement methods encounter
limitations in explicit phase estimation due to the non-structural nature and
wrapping characteristics of the phase, leading to a bottleneck in enhanced
speech quality. To overcome this issue, we propose
MP-SENet, a novel Speech Enhancement Network which explicitly enhances
Magnitude and Phase spectra in parallel. The proposed MP-SENet adopts a codec
architecture in which the encoder and decoder are bridged by time-frequency
Transformers along both time and frequency dimensions. The encoder aims to
encode time-frequency representations derived from the input distorted
magnitude and phase spectra. The decoder comprises dual-stream magnitude and
phase decoders, directly enhancing magnitude and wrapped phase spectra by
incorporating a magnitude estimation architecture and a phase parallel
estimation architecture, respectively. To train the MP-SENet model effectively,
we define multi-level loss functions, including mean square error and
perceptual metric loss of magnitude spectra, anti-wrapping loss of phase
spectra, as well as mean square error and consistency loss of short-time
complex spectra. Experimental results demonstrate that our proposed MP-SENet
excels in high-quality speech enhancement across multiple tasks, including
speech denoising, dereverberation, and bandwidth extension. Compared to
existing phase-aware speech enhancement methods, it successfully avoids the
bidirectional compensation effect between the magnitude and phase, leading to a
better harmonic restoration. Notably, for the speech denoising task, the
MP-SENet yields a state-of-the-art performance with a PESQ of 3.60 on the
public VoiceBank+DEMAND dataset.
Comment: Submitted to IEEE Transactions on Audio, Speech and Language
Processin
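To make the idea of an anti-wrapping phase loss concrete, the sketch below uses one common formulation, assumed here rather than taken from the abstract: the phase error is mapped to its principal value in [-pi, pi] before taking the absolute value, so that predictions differing from the target by a multiple of 2*pi incur no penalty. The names `anti_wrap` and `anti_wrapping_l1` are hypothetical, and only a plain phase term is shown; any additional terms the authors may use are omitted.

```python
import math

TWO_PI = 2.0 * math.pi


def anti_wrap(x):
    """Absolute value of x after wrapping it to its principal value."""
    return abs(x - TWO_PI * round(x / TWO_PI))


def anti_wrapping_l1(pred, target):
    """Mean anti-wrapped absolute error between two phase sequences."""
    return sum(anti_wrap(p - t) for p, t in zip(pred, target)) / len(pred)


# A prediction of 3.0 rad against a target of -3.0 rad is close on the
# circle (about 0.283 rad apart), even though the raw difference is 6.0.
loss = anti_wrapping_l1([3.0], [-3.0])
```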
Learning Probabilistic Coordinate Fields for Robust Correspondences
We introduce Probabilistic Coordinate Fields (PCFs), a novel
geometric-invariant coordinate representation for image correspondence
problems. In contrast to standard Cartesian coordinates, PCFs encode
coordinates in correspondence-specific barycentric coordinate systems (BCS)
with affine invariance. To know when and where to trust the encoded
coordinates, we implement PCFs in a probabilistic network termed PCF-Net, which
parameterizes the distribution of coordinate fields as Gaussian mixture models.
By jointly optimizing coordinate fields and their confidence conditioned on
dense flows, PCF-Net can work with various feature descriptors when quantifying
the reliability of PCFs by confidence maps. An interesting observation of this
work is that the learned confidence map converges to geometrically coherent and
semantically consistent regions, which facilitates robust coordinate
representation. By delivering the confident coordinates to keypoint/feature
descriptors, we show that PCF-Net can be used as a plug-in to existing
correspondence-dependent approaches. Extensive experiments on both indoor and
outdoor datasets suggest that accurate geometric invariant coordinates help to
achieve the state of the art in several correspondence problems, such as sparse
feature matching, dense image registration, camera pose estimation, and
consistency filtering. Further, the interpretable confidence map predicted by
PCF-Net can also be leveraged for other novel applications, from texture transfer
to multi-homography classification.
Comment: Accepted by IEEE Transactions on Pattern Analysis and Machine
Intelligenc
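The affine invariance of barycentric coordinates mentioned above can be checked numerically. The sketch below computes generic barycentric coordinates of a 2D point with respect to a triangle and shows that they are unchanged when the point and triangle undergo the same affine map. It does not reproduce the paper's correspondence-specific construction or PCF-Net; the function names and the example triangle are assumptions for illustration.

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates of 2D point p w.r.t. triangle (a, b, c)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    w_a = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    w_b = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return (w_a, w_b, 1.0 - w_a - w_b)


def affine(p, m, t):
    """Apply the affine map x -> m @ x + t to a 2D point."""
    return (m[0][0] * p[0] + m[0][1] * p[1] + t[0],
            m[1][0] * p[0] + m[1][1] * p[1] + t[1])


a, b, c = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)
p = (0.25, 0.25)
m, t = [[2.0, 1.0], [0.5, 3.0]], (4.0, -1.0)  # arbitrary invertible map

before = barycentric(p, a, b, c)
after = barycentric(affine(p, m, t),
                    affine(a, m, t), affine(b, m, t), affine(c, m, t))
# `before` and `after` agree up to floating-point rounding.
```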
Adiponectin improves coronary no-reflow injury by protecting the endothelium in rats with type 2 diabetes mellitus.
To determine the effect of adiponectin (APN) on the coronary no-reflow (NR) injury in rats with Type 2 diabetes mellitus (T2DM), 80 male Sprague-Dawley rats were fed a high-sugar-high-fat diet to build a T2DM model. Rats received vehicle or APN in the last week and then were subjected to myocardial ischemia reperfusion (MI/R) injury. Endothelium-dependent vasorelaxation of the thoracic aorta was significantly decreased and serum levels of endothelin-1 (ET-1), intercellular cell adhesion molecule-1 (ICAM-1) and vascular cell adhesion molecule-1 (VCAM-1) were noticeably increased in T2DM rats compared with rats without T2DM. Serum APN was positively correlated with the endothelium-dependent vasorelaxation, but negatively correlated with the serum level of ET-1. Treatment with APN improved T2DM-induced endothelium-dependent vasorelaxation, recovered cardiac function, and decreased both NR size and the levels of ET-1, ICAM-1 and VCAM-1. Hypoadiponectinemia was associated with the aggravation of coronary NR in T2DM rats. APN could alleviate coronary NR injury in T2DM rats by protecting the endothelium and improving microcirculation.
Combination Therapy With Fingolimod and Neural Stem Cells Promotes Functional Myelination
Myelination, which occurs predominantly postnatally and continues throughout life, is important for proper neurologic function of the mammalian central nervous system (CNS). We have previously demonstrated that the combination therapy of fingolimod (FTY720) and transplanted neural stem cells (NSCs) had a significantly enhanced therapeutic effect on the chronic stage of experimental autoimmune encephalomyelitis, an animal model of CNS autoimmunity, compared to using either one of them alone. However, reduced disease severity may be secondary to the immunomodulatory effects of FTY720 and NSCs, and whether this therapy directly affects myelinogenesis remains unknown. To investigate this important question, we used three myelination models under minimal or non-inflammatory microenvironments. Our results showed that FTY720 drives NSCs to differentiate into oligodendrocytes and promotes myelination in an ex vivo brain slice culture model, and in the developing CNS of healthy postnatal mice in vivo. Elevated levels of neurotrophic factors, e.g., brain-derived neurotrophic factor and glial cell line-derived neurotrophic factor, were observed in the CNS of the treated infant mice. Further, FTY720 and NSCs efficiently prolonged the survival and improved sensorimotor function of shiverer mice. Together, these data demonstrate a direct effect of FTY720, beyond its known immunomodulatory capacity, in NSC differentiation and myelin development as a novel mechanism underlying its therapeutic effect in demyelinating diseases.